“A collection of concise but detailed information about a particular subject”
A comprehensive archive summarizing your project (data/analysis/code).
A few simple rules, for you and others, so that you can share your code, data and results with your students, supervisors, collaborators and/or the scientific community.
The problem “I have my own organisation”
Enter …
the ‘Research Compendium’
The goal?
The goal of a research compendium is to provide a standard and easily recognisable way for organising the digital materials of a project to enable others to inspect, reproduce, and extend the research.
Three Generic Principles
1 project = 1 folder = 1 compendium
e.g. with RStudio: use RStudio projects
│
├── [my_project]
│ └── my_project.Rproj
│
├── [another_project]
│ └── another_project.Rproj
│
├── [again_another_project]
│ └── again_another_project.Rproj
│
Stop using setwd()!
Absolute paths (e.g. C:\\Albert\Bureau\PhD) only work on your computer (and not on others).
Use relative paths defined from the root of the project: e.g. outputs/01_datacleaned.csv, data/data_raw.csv
Use the package {here}
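A minimal sketch of building paths with {here}, assuming the package is installed and R is started somewhere inside the project. The file names are the ones used in the examples above.

```r
library(here)

# here() builds a path from the project root (detected, for example, through
# the .Rproj file), whatever the current working directory is.
raw_path   <- here("data", "data_raw.csv")
clean_path <- here("outputs", "01_datacleaned.csv")

# Show the project root that {here} detected
print(here())
```

The same two lines work unchanged on a collaborator's computer, which is not the case for an absolute path.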
Data files, code files and output files are separated.
This separation is reflected in the folder structure.
.
├── my_project.Rproj
├── [data]
└── [outputs]
Implications
Keeping data and method separate treats the data as “read-only”, so that the original data is untouched and all modifications are transparently documented in the code.
The output files should be considered as disposable, with a mindset that one can always easily regenerate the output using the code and data.
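The "read-only data, disposable outputs" idea could look like the sketch below. The function name `clean_raw_data()` and the toy data are hypothetical, and a temporary folder stands in for the project so that the example runs anywhere.

```r
# Hypothetical cleaning step: it reads the raw file and never writes to it.
clean_raw_data <- function(raw_path, out_path) {
  raw <- read.csv(raw_path)
  # Drop rows with missing values; the rule is documented here, in code
  cleaned <- raw[complete.cases(raw), , drop = FALSE]
  dir.create(dirname(out_path), showWarnings = FALSE, recursive = TRUE)
  write.csv(cleaned, out_path, row.names = FALSE)
  invisible(cleaned)
}

# Toy project in a temporary folder (assumption: stands in for my_project/)
project <- file.path(tempdir(), "my_project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
raw_file <- file.path(project, "data", "data_raw.csv")
write.csv(data.frame(length = c(10, NA, 30), weight = c(5, 8, 40)),
          raw_file, row.names = FALSE)

out_file <- file.path(project, "outputs", "01_datacleaned.csv")
cleaned  <- clean_raw_data(raw_file, out_file)
```

Deleting the outputs folder loses nothing: running the function again recreates the cleaned file from the untouched raw data.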
- The analysis flow (the methods) is split into reusable pieces (functions), which are called by analysis scripts:
.
├── my_project.Rproj
├── [data] (raw data)
├── [R] (functions = small pieces of reusable code)
├── [analyses] (scripts)
└── [outputs] (results)
Be careful in R
The R folder should only contain .R files with function definitions. Any top-level call in the R folder will be executed when calling devtools::load_all() or targets::tar_source().
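As an illustration, a file in the R folder could look like the sketch below. The file name and the function `fit_length_weight()` are hypothetical; the log-log model is only one possible choice for a length-weight analysis.

```r
# R/fit_length_weight.R (hypothetical file name)
# The file holds a function definition only: sourcing it runs no analysis.
fit_length_weight <- function(data) {
  # Log-log linear model: log(weight) = a + b * log(length)
  stats::lm(log(weight) ~ log(length), data = data)
}
```

The call itself, for example `fit <- fit_length_weight(fish)`, belongs in a script of the analyses folder, not in the R folder.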
.
├── my_project.Rproj
├── [data]
├── [R]
├── [analyses]
│ ├── 00_setup.R (load packages, global variables)
│ ├── 01_data.R (read and format data)
│ ├── 02_length-weight.R (first analysis)
│ ├── 03_plot-length-weight.R (generate first plot)
│ ├── ...
└── [outputs]
############################################################
#
# 00_setup.R: load packages, set global variables
#
############################################################
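Below such a header, the body of the setup script might look like this sketch. The package list and the variable names are placeholders, not a recommendation.

```r
# Packages (hypothetical list; replace with the ones your project needs)
packages <- c("stats", "utils")
for (pkg in packages) {
  library(pkg, character.only = TRUE)
}

# Global variables shared by the other scripts (hypothetical names)
dir_data    <- "data"
dir_outputs <- "outputs"

# Fix the random seed so that stochastic steps give the same result each run
seed <- 2024
set.seed(seed)
```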
The relationship between which code operates on which data in which order to produce which outputs must be specified as well.
Use a main script (make.R) which executes the different steps in the right order (it’s the only R script at the root of the folder!)
.
├── my_project.Rproj
├── [data]
├── [R]
├── [analyses]
├── [outputs]
└── make.R
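A make.R along these lines is one way to do it. This is a sketch, assuming that the scripts in the analyses folder carry a numeric prefix (00_, 01_, ...) that defines their order; the function name `run_pipeline()` is hypothetical.

```r
# make.R: hypothetical sketch of a main script
run_pipeline <- function(root = ".") {
  # 1. Load the function definitions stored in R/
  function_files <- list.files(file.path(root, "R"),
                               pattern = "\\.R$", full.names = TRUE)
  for (f in sort(function_files)) {
    source(f)
  }

  # 2. Run the analysis scripts in the order given by their numeric prefix
  scripts <- sort(list.files(file.path(root, "analyses"),
                             pattern = "\\.R$", full.names = TRUE))
  for (s in scripts) {
    message("Running ", basename(s))
    source(s)
  }

  # Return the names of the scripts that were run, in order
  invisible(basename(scripts))
}

run_pipeline()
```

Running this single file from the project root replays the whole analysis, which is what makes the outputs disposable.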
.
├── my_project.Rproj
├── [data]
│ ├── [raw_data]
│ └── [derived_data]
├── [R]
├── [analyses]
├── [figures]
└── [outputs]
Flexibility
Depending on your project, the corresponding organisation might be more or less complex. Adapt the compendium to your needs.
.
├── DESCRIPTION
├── [data]
├── [R]
├── [analyses]
├── [outputs]
├── [syntheses]
│ ├── paper.qmd
│ └── presentation.qmd
├── my_project.Rproj
├── README.md
├── README.qmd
├── renv.lock
└── make.R
Separate documents such as papers and presentations
Add useful resources (bibliography, etc.)
.
├── DESCRIPTION
├── [data]
├── [R]
├── [analyses]
├── [outputs]
├── [syntheses]
├── [documents]
├── my_project.Rproj
├── README.md
├── README.qmd
├── renv.lock
└── make.R
At its most basic, the description of the computational environment could be a plain text file that lists the names and version numbers of the software and other critical tools used for the analysis. In more complex approaches, described below, the computational environment can be automatically preserved or reproduced as well.
Place a README file at the root of the project.
e.g. write an Rmd or qmd file, and render it from make.R.
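Rendering the README from make.R could be sketched as follows, assuming the {quarto} R package and the Quarto command-line tool are installed. The helper name `render_readme()` and the choice of the `gfm` (GitHub-flavoured Markdown) output format are assumptions.

```r
# In make.R: rebuild README.md from README.qmd
render_readme <- function(input = "README.qmd") {
  if (!file.exists(input)) {
    message("No ", input, " found, skipping.")
    return(invisible(FALSE))
  }
  # Render the Quarto source to Markdown (README.qmd -> README.md)
  quarto::quarto_render(input, output_format = "gfm")
  invisible(TRUE)
}

render_readme()
```

For a README.Rmd, `rmarkdown::render("README.Rmd")` plays the same role.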
.
├── my_project.Rproj
├── [data]
├── [R]
├── [analyses]
├── [outputs]
├── README.md
├── README.qmd
└── make.R
The compendium should specify the computational environment that was used for the original analysis.
Use a DESCRIPTION file and the {renv} package to record the packages!
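A DESCRIPTION file is a plain text file of `Field: value` lines, so base R can write and read it. The sketch below uses placeholder values throughout and a temporary path instead of the project root.

```r
# Hypothetical DESCRIPTION content: every value below is a placeholder
description <- data.frame(
  Package = "myproject",
  Title   = "Research compendium of my project",
  Version = "0.0.1",
  Imports = "here, renv"
)

# In a real project the path would simply be "DESCRIPTION"
desc_path <- file.path(tempdir(), "DESCRIPTION")
write.dcf(description, file = desc_path)

# Read the declared dependencies back from the file
imports <- read.dcf(desc_path, fields = "Imports")[1, "Imports"]
deps    <- strsplit(imports, ",\\s*")[[1]]
print(deps)
```

With {renv}, `renv::init()` sets up a project-specific library, `renv::snapshot()` records the exact package versions in renv.lock, and `renv::restore()` reinstalls them on another computer.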
.
├── DESCRIPTION
├── [data]
├── [R]
├── [analyses]
├── [outputs]
├── my_project.Rproj
├── README.md
├── README.qmd
├── renv.lock
└── make.R
There are a number of online options to store your project.
Many are private (e.g. Dryad, https://datadryad.org/)
Zenodo (https://zenodo.org/) was created by OpenAIRE and CERN in 2013 and allows uploads of up to 50 GB.
.
├── [data]
│ └── raw-data.csv (raw data)
├── [R]
│ └── functions.R (functions)
├── [analyses]
│ └── pipeline.R (workflow)
├── [outputs] (results)
├── [syntheses]
│ └── paper.qmd (article, supp. mat, presentation)
├── [documents] (bibliography)
├── my_project.Rproj (project)
├── README.md (help)
├── DESCRIPTION (dependencies, packages)
└── make.R (setup, workflow)